-
Notifications
You must be signed in to change notification settings - Fork 501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
in Search API show fileCount for datasets #6601 #6623
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant to ask about the decision not to index the number.
Are we always guaranteed to have the dataset/datasetversion db objects instantiated? SearchServiceBean.search() has the boolean retrieveEntities; are we ever calling it with this set to FALSE? If it does get called that way, and the entity is not instantiated, the code in this PR will be safe, it looks like (it will be skipped, so no null pointer exception...)
I guess my question is, are we ok with the fact that the fileCount field is not going to be available if search() is called with the "do not instantiate" option? Is it ever called that way?
[I guess the answer is, we DO always instantiate the entity when search is called in the API context; and this issue is for adding fileCont to the API output only - and not to the search cards on the page?]
Instantiating entity objects IS expensive; I recall we made an effort in the past to keep everything we needed in the search result object indexed, and to avoid instantiating entities, or at least to avoid using unoptimized .find() methods to do so. It appears that we have abandoned that approach since - and it may be worth revisiting. (This is of course outside the scope for this issue!)
Yes, we are. Here's how we talk about this in https://github.com/IQSS/dataverse/releases/tag/v4.19 'The boolean parameter query_entities has been removed from the Search API. The former "true" behavior of "whether entities are queried via direct database calls (for developer use)" is now always true.' This change was introduced in pull request #6441 |
OK, I remember it now.
|
Probably we should have indexed the number (of files, |
What this PR does / why we need it:
Search API users have requested counts of files in datasets.
Which issue(s) this PR closes:
Closes #6601
Special notes for your reviewer:
Despite the noise I made in the issue about reindexing, I went for a database hit, nestled in several lines of other database hits. We already have the datasetVersion in our hands. It seemed pretty quick with my local testing, admittedly only a few files.
Suggestions on how to test this:
Look for "fileCount" in the response for datasets. It might be nice to test a dataset with many files to see if it's slow. A zero is returned, as expected, with no files.
Does this PR introduce a user interface change?:
No.
Is there a release notes update needed for this change?:
I didn't bother since it seems like a small feature. I'm reminded that some day we might want to keep an API change log. See also http://guides.dataverse.org/en/4.19/api/faq.html#is-there-a-changelog-of-api-functionality-that-has-been-added-over-time
Additional documentation:
None. I did update the JSON examples in the Search API docs to show "fileCount".